{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Introduction to Machine Learning\n",
        "\n",
        "In this notebook we'll cover:  \n",
        "- What is Machine Learning?\n",
        "- What are the steps that need to be performed?\n",
        "- 'Hello ML.NET World' - training your first ML.NET model. "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## What is Machine Learning?\n",
        "\n",
        "\n",
        "```\n",
        "var size = new HouseData() { Size = 2.5F };\n",
        "var price = predictionEngine.Predict(size);\n",
        "```\n",
        "\n",
        "The above code shows how to consume a model that's already been trained. The end result of training a model is a function you can pass some data `HouseData.Size`  to the model and it will give you back a prediction - `Prediction.Price`. \n",
        "\n",
        "The above is a simple example (probably too simple) but models can take in many more values.  For instance - [Value Prediction/Regression with Taxi Dataset](https://ntbk.io/ml-e2e-taxi) -- \n",
        "is a more complex example that takes in `vendor_id`,  `rate_code`, `passenger_count`, `trip_time_in_secs`, `trip_distance`, `payment_type` and then predicts `fare_amount`.\n",
        "\n",
        "### How do you create that function?\n",
        "Machine Learning creates or trains that model by feeding an algorithm data and using statistics to predict the values. More details and example below!"
      ]
    },
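    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make \"feeding an algorithm data and using statistics\" a little more concrete, here is a minimal sketch in plain C# that fits a straight line to four size/price pairs with ordinary least squares. This is a simpler, swapped-in technique chosen for illustration, not what the ML.NET trainer used later in this notebook does internally, and names such as `predictPrice` are made up for this example."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [],
      "source": [
        "using System;\n",
        "using System.Linq;\n",
        "\n",
        "// The same four (size, price) pairs that are used as training data later in this notebook.\n",
        "float[] sizes = { 1.1F, 1.9F, 2.8F, 3.4F };\n",
        "float[] prices = { 1.2F, 2.3F, 3.0F, 3.7F };\n",
        "\n",
        "// Ordinary least squares for a single feature:\n",
        "//   slope     = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2)\n",
        "//   intercept = meanY - slope * meanX\n",
        "float meanSize = sizes.Average();\n",
        "float meanPrice = prices.Average();\n",
        "float covariance = sizes.Zip(prices, (s, p) => (s - meanSize) * (p - meanPrice)).Sum();\n",
        "float variance = sizes.Sum(s => (s - meanSize) * (s - meanSize));\n",
        "float slope = covariance / variance;\n",
        "float intercept = meanPrice - slope * meanSize;\n",
        "\n",
        "// The \"trained model\" is just this function (hypothetical name).\n",
        "Func<float, float> predictPrice = houseSize => slope * houseSize + intercept;\n",
        "\n",
        "Console.WriteLine($\"price = {slope:0.00} * size + {intercept:0.00}\");\n",
        "Console.WriteLine($\"Predicted price for size 2.5: {predictPrice(2.5F):0.00}\");"
      ]
    },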
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## What are the steps that needed to be performed?\n",
        "\n",
        "1. **Source and Prepare Training Data**  \n",
        "  To train the model we need labeled data.  Labeled data just means a bunch of data that has the column to predict already in the dataset so that the training algorithm can learn to predict the values.\n",
        "1. **Pick Training Algorithm and Train**  \n",
        "    >**Spoiler Alert**  \n",
Through out">
        "    >Throughout most of our examples we'll be using AutoML to simplify this process. AutoML strategically tries out various algorithms and parameters for a given task to find out which will work best for your data!  \n",
        "    >  \n",
        "    >You can think of this as a fancy for-loop to just try all the options. Our AutoML is a bit smarter than this ... but that is essentially what it does!\n",
        "    >\n",
        "    > For the example below we'll train a specific algorithm - so you can see how that works!\n",
        "    1. Pick a Task - [ML.NET Tasks](https://docs.microsoft.com/dotnet/machine-learning/resources/tasks)\n",
        "    1. Pick an Algorithm - [ML.NET Algorithms](https://docs.microsoft.com/dotnet/machine-learning/how-to-choose-an-ml-net-algorithm)\n",
        "    1. Set Algorithm Parameters [Glossary - Hyperparameters](https://docs.microsoft.com/dotnet/machine-learning/resources/glossary#hyperparameter)\n",
        "    1.  Train -\n",
        "        This is where the data actually gets fed to the algorithm to train the model. This can take sometime depending on the amount of data, algorithm, and the parameters for that algorithm.\n",
        "\n",
        "1. **Evaluate**  \n",
        "  Once you've trained a model - how do you know it works? There are a bunch of techniques to evaluate your models performance. If you'd like to take a deeper dive - Checkout [Evaluation Metrics](https://docs.microsoft.com/dotnet/machine-learning/resources/metrics). Otherwise we'll give examples througout these tutorials.\n",
        "1. **Deploy**  \n",
        "  After you've trained a model ... it's just .NET Code! Build it Ship it - however you currently deploy your application."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## How do I get started?\n",
        "Below we have a quick introduction to ML.NET - \"Hello ML.NET World!\" and the next three Notebooks in the series take a deep dive into [Data Prep and Feature Engineering](https://ntbk.io/ml-02-data), [Training and AutoML](https://ntbk.io/ml-03-training), and [Model Evaluation](https://ntbk.io/ml-04-evaluation)  "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Hello ML.NET World!\n",
        "The code in the following snippet demonstrates the simplest ML.NET application. This example constructs a linear regression model to predict house prices using house size and price data. \n",
        "\n",
        "First step is to reference the [Microsoft.ML](https://www.nuget.org/packages/Microsoft.ML/) package.\n",
        "\n",
        "Regarding this notebook, we add the reference to the package reference as follow:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><div></div><div></div><div><strong>Installed Packages</strong><ul><li><span>Microsoft.ML, 2.0.0-preview.22356.1</span></li></ul></div></div>"
            ]
          },
          "execution_count": 1,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "#i \"nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json\"\n",
        "#r \"nuget: Microsoft.ML, 2.0.0-preview.22356.1\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The second step is to reference the ML.NET namespaces:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [],
      "source": [
        "using Microsoft.ML;\n",
        "using Microsoft.ML.Data;"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we are ready to write the code to achieve the machine learning task we need to do. Always start with creating the [MLContext](https://docs.microsoft.com/dotnet/api/microsoft.ml.mlcontext?view=ml-dotnet) which is the common context for all ML.NET operations"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [],
      "source": [
        "MLContext mlContext = new MLContext();"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Next step is to define the data structures for the data we are going to use. This sample is about house prediction prices. Start defining the following data structure which contains the house size and price:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [],
      "source": [
        "public class HouseData\n",
        "{\n",
        "    public float Size { get; set; }\n",
        "    public float Price { get; set; }\n",
        "}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Then define the house price prediction data structure"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [],
      "source": [
        "public class Prediction\n",
        "{\n",
        "    [ColumnName(\"Score\")]\n",
        "    public float Price { get; set; }\n",
        "}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we are ready to train the pre-collected data we'll use for the house price prediction scenario"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [],
      "source": [
        "HouseData[] houseData = {\n",
        "    new HouseData() { Size = 1.1F, Price = 1.2F },\n",
        "    new HouseData() { Size = 1.9F, Price = 2.3F },\n",
        "    new HouseData() { Size = 2.8F, Price = 3.0F },\n",
        "    new HouseData() { Size = 3.4F, Price = 3.7F } };"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Using the `MLContext` we previously created, load the training data into ML.NET [IDataView](https://docs.microsoft.com/dotnet/api/microsoft.ml.idataview?view=ml-dotnet) which is the fundamental ML.NET data type"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [],
      "source": [
        "IDataView trainingData = mlContext.Data.LoadFromEnumerable(houseData);"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we have the data ready, next we'll create the ML.NET pipeline specifying the trainer we are going to use to build our machine learning model. For house price prediction, we are going to use the regression trainer. ML.NET supports other machine learning trainers which can be used for other scenarios as needed. The pipeline will create what is called [Estimator](https://docs.microsoft.com/dotnet/api/microsoft.ml.iestimator-1?view=ml-dotnet) which used to define the operations applied to the data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [],
      "source": [
        "// 2. Specify data preparation and model training pipeline\n",
        "var pipeline = mlContext.Transforms.Concatenate(\"Features\", new[] { \"Size\" })\n",
        "               .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: \"Price\", maximumNumberOfIterations: 100));"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "After creating the estimator, we are ready to apply the transformations and trainer defined in the pipeline to the data. To do that, call the [Fit](https://docs.microsoft.com/dotnet/api/microsoft.ml.iestimator-1.fit?view=ml-dotnet) method."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [],
      "source": [
        "var model = pipeline.Fit(trainingData);"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we can evaluate the trained model. The way to do that is by loading a prepared test data and then calling the [Evaluate](https://docs.microsoft.com/dotnet/api/microsoft.ml.regressioncatalog.evaluate?view=ml-dotnet) method, then printing the [Coefficient of determination](https://en.wikipedia.org/wiki/Coefficient_of_determination) to find out how the model is fitted using the test data. The closer the Coefficient of determination to 1 is better-fitted model. Repeat the training and evaluation steps till getting a satisfactory result from the trained model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "Coefficient of determination for the trained model: 0.98\r\n"
            ]
          },
          "execution_count": 1,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "HouseData[] testData = {\n",
        "    new HouseData() { Size = 1.1F, Price = 1.2F },\n",
        "    new HouseData() { Size = 1.2F, Price = 1.5F },\n",
        "    new HouseData() { Size = 1.4F, Price = 1.7F },\n",
        "    new HouseData() { Size = 1.6F, Price = 1.9F },\n",
        "    new HouseData() { Size = 1.9F, Price = 2.3F },\n",
        "    new HouseData() { Size = 2.8F, Price = 3.0F },\n",
        "    new HouseData() { Size = 3.2F, Price = 3.5F },\n",
        "    new HouseData() { Size = 3.3F, Price = 3.6F },\n",
        "    new HouseData() { Size = 3.5F, Price = 3.9F }, \n",
        "    new HouseData() { Size = 3.7F, Price = 4.3F }, \n",
        "    new HouseData() { Size = 4.0F, Price = 6.1F }, \n",
        "    new HouseData() { Size = 5.0F, Price = 7.2F }, \n",
        "    new HouseData() { Size = 6.0F, Price = 8.5F }, \n",
        "    new HouseData() { Size = 7.0F, Price = 9.8F }, \n",
        "    new HouseData() { Size = 8.0F, Price = 10.3F }, \n",
        "};\n",
        "\n",
        "// Load the test data\n",
        "IDataView trainingTestData = mlContext.Data.LoadFromEnumerable(testData);\n",
        "\n",
        "// transform the test data using the model\n",
        "IDataView transformedTestData = model.Transform(trainingTestData);\n",
        "\n",
        "// Evaluate the model against the test data\n",
        "RegressionMetrics trainedModelMetrics = mlContext.Regression.Evaluate(transformedTestData, labelColumnName: \"Size\");\n",
        "\n",
        "// Print the R-Squared value. The Closer to 1 indicates a better fitted model.\n",
        "Console.WriteLine($\"Coefficient of determination for the trained model: {trainedModelMetrics.RSquared:0.00}\");"
      ]
    },
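    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The advice above to repeat the training and evaluation steps can be sketched as a small loop. This is a minimal sketch, not AutoML: it assumes the `mlContext`, `trainingData`, and `testData` variables from the cells above, retrains the same SDCA pipeline with a few hand-picked values of `maximumNumberOfIterations`, and prints the R-squared for each. The candidate values and the names `iterationCandidates`, `sweepTestData`, `candidatePipeline`, `candidateModel`, and `candidateMetrics` are assumptions introduced only for this illustration."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [],
      "source": [
        "// Hypothetical candidate values, not tuned recommendations.\n",
        "int[] iterationCandidates = { 10, 100, 1000 };\n",
        "\n",
        "// Reuse the test data array defined in the previous cell.\n",
        "IDataView sweepTestData = mlContext.Data.LoadFromEnumerable(testData);\n",
        "\n",
        "foreach (int iterations in iterationCandidates)\n",
        "{\n",
        "    // Same pipeline as before; only the iteration limit changes.\n",
        "    var candidatePipeline = mlContext.Transforms.Concatenate(\"Features\", new[] { \"Size\" })\n",
        "        .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: \"Price\", maximumNumberOfIterations: iterations));\n",
        "\n",
        "    // Train on the training data, then score and evaluate on the test data.\n",
        "    var candidateModel = candidatePipeline.Fit(trainingData);\n",
        "    RegressionMetrics candidateMetrics = mlContext.Regression.Evaluate(\n",
        "        candidateModel.Transform(sweepTestData), labelColumnName: \"Price\");\n",
        "\n",
        "    Console.WriteLine($\"maximumNumberOfIterations = {iterations}: R-squared = {candidateMetrics.RSquared:0.00}\");\n",
        "}"
      ]
    },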
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we have the trained model ready for prediction. Let's use this model to predict a sample house price. We do that by creating the the prediction engine [PredictionEngine<TSrc,TDst>](https://docs.microsoft.com/dotnet/api/microsoft.ml.predictionengine-2?view=ml-dotnet). The prediction engine is the class for making single predictions on a previously trained model (and preceding transform pipeline). Creation of the prediction engine from the trained model can be done by the following code:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [],
      "source": [
        "var predictionEngine = mlContext.Model.CreatePredictionEngine<HouseData, Prediction>(model);"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Then using the created prediction engine we can predict the house price as follows:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "Predicted price for size: 2500 sq ft= $274.48k\r\n"
            ]
          },
          "execution_count": 1,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "var size = new HouseData() { Size = 2.5F };\n",
        "var price = predictionEngine.Predict(size);\n",
        "Console.WriteLine($\"Predicted price for size: {size.Size*1000} sq ft= {price.Price*100:C}k\");"
      ]
    },
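    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The Deploy step mentioned earlier usually starts with saving the trained model to a file and loading it back in the application that serves predictions. The following is a minimal sketch, assuming the `mlContext`, `model`, and `trainingData` variables from the cells above. The file name `house-price-model.zip` and the names `modelPath`, `loadedModel`, `loadedSchema`, `loadedEngine`, and `loadedPrediction` are hypothetical choices made for this illustration."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "dotnet_interactive": {
          "language": "csharp"
        },
        "vscode": {
          "languageId": "csharp"
        }
      },
      "outputs": [],
      "source": [
        "// Hypothetical file name chosen for this illustration.\n",
        "string modelPath = \"house-price-model.zip\";\n",
        "\n",
        "// Save the trained model together with the schema of its input data.\n",
        "mlContext.Model.Save(model, trainingData.Schema, modelPath);\n",
        "\n",
        "// Load the model back, as a deployed application might do at startup.\n",
        "ITransformer loadedModel = mlContext.Model.Load(modelPath, out DataViewSchema loadedSchema);\n",
        "\n",
        "// Create a prediction engine from the loaded model and use it as before.\n",
        "var loadedEngine = mlContext.Model.CreatePredictionEngine<HouseData, Prediction>(loadedModel);\n",
        "Prediction loadedPrediction = loadedEngine.Predict(new HouseData() { Size = 2.5F });\n",
        "Console.WriteLine($\"Prediction from the reloaded model: {loadedPrediction.Price:0.00}\");"
      ]
    },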
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Congrats! You have successfully trained an ML.NET regression model using your own data, then used this model to predict the house prices. Here is a diagram summarizing the end-to-end operation of creation and training ML.NET model then using it to predict the house prices.\n",
        "\n",
        "![](https://docs.microsoft.com/dotnet/machine-learning/media/mldotnet-annotated-workflow.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Continue learning\n",
        "\n",
        "> [⏩ Next Module - Data Prep and Feature Engineering](./02-Data%20Preparation%20and%20Feature%20Engineering.ipynb)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": ".NET (C#)",
      "language": "C#",
      "name": ".net-csharp"
    },
    "language_info": {
      "file_extension": ".cs",
      "mimetype": "text/x-csharp",
      "name": "C#",
      "pygments_lexer": "csharp",
      "version": "8.0"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}
